Rapid adaptation of n-gram language models using inter-word correlation for speech recognition
Authors
Abstract
In this paper, we study the problem of rapidly adapting an n-gram language model under the MAP estimation framework. We propose a heuristic method that exploits inter-word correlation to accelerate MAP adaptation of the n-gram model: based on these correlations, the occurrence of one word in the adaptation text is used to predict all other words. In this way, a large n-gram model can be adapted efficiently with a small amount of adaptation data. The proposed fast adaptation approach is evaluated on a Japanese newspaper corpus, where we observe a significant perplexity reduction even with only a few hundred adaptation sentences.
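As a rough illustration of the idea, the sketch below shows Dirichlet-prior MAP adaptation of a unigram model in which counts observed in the adaptation text are propagated to correlated words. This is a minimal sketch, not the authors' exact formulation: the function name, the `corr` table, and the prior weight `tau` are assumptions made here for clarity.

```python
from collections import Counter

def map_adapt_unigram(bg_probs, adapt_tokens, corr, tau=100.0):
    """Sketch of MAP adaptation where adaptation counts are propagated
    to correlated words before being combined with the background model.

    bg_probs     : dict  word -> background probability p_bg(w)
    adapt_tokens : list  of words from the (small) adaptation text
    corr         : dict  (observed_word, target_word) -> correlation in [0, 1]
    tau          : prior weight; larger values trust the background model more
    """
    observed = Counter(adapt_tokens)

    # Propagate each observed count to correlated words, so words that never
    # occur in the adaptation text still receive pseudo-counts.
    pseudo = Counter()
    for v, c_v in observed.items():
        pseudo[v] += c_v
        for w in bg_probs:
            r = corr.get((v, w), 0.0)
            if r > 0.0 and w != v:
                pseudo[w] += r * c_v

    total = sum(pseudo.values())

    # Standard MAP (Dirichlet-prior) combination of pseudo-counts and prior.
    return {w: (pseudo.get(w, 0.0) + tau * p_bg) / (total + tau)
            for w, p_bg in bg_probs.items()}
```

The paper applies the idea to full n-gram distributions; the unigram case above is only meant to show how a few hundred adaptation sentences can update probabilities across the whole vocabulary.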
Similar Papers
Rapid Unsupervised Topic Adaptation – a Latent Semantic Approach
In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source l...
Looking at alternatives within the framework of n-gram based language modeling for spontaneous speech recognition
This paper presents different methods that use a weighted mixture of word and word-class language models to perform language model adaptation. A general language model is built from the whole training corpus; then several clusters are created according to a word co-occurrence measure, and finally word models as well as word-class models are built from each cluster. The general ...
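For illustration only, such a weighted mixture can be expressed as a linear interpolation of the general and cluster-specific models. The sketch assumes the models expose a simple per-word probability interface; names, signatures, and weights are hypothetical, not taken from the paper.

```python
def mixture_lm(p_general, cluster_models, weights):
    """Sketch of a weighted mixture of a general word LM and several
    cluster-specific word/word-class LMs.

    p_general      : callable (word, history) -> probability from the general LM
    cluster_models : list of callables with the same signature
    weights        : one weight for the general LM plus one per cluster model;
                     assumed to sum to 1 (e.g. estimated by EM on held-out text)
    """
    def p(word, history):
        prob = weights[0] * p_general(word, history)
        for lam, model in zip(weights[1:], cluster_models):
            prob += lam * model(word, history)
        return prob
    return p
```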
Unsupervised Language Model Adaptation … Transcription
Unsupervised adaptation methods have been applied successfully to the acoustic models of speech recognition systems for some time. Relatively little work has been carried out in the area of unsupervised language model adaptation however. The work presented here uses the output of a speech recogniser to adapt the backoff n-gram language model used in the decoding process. We report results for t...
Effects of word string language models on noisy broadcast news speech recognition
In this paper, we show that our n-gram-based word string language model, combined with speaker and noise adaptation of the acoustic model, improves recognition performance on noisy broadcast news speech. The focus was on remedying recognition errors of short words. The word string language models based on POS and n-gram frequency reduced deletion errors by 17%, i...
Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition
Traditionally, short-range Language Models (LMs) like the conventional n-gram models have been used for language model adaptation. Recent work has improved performance for such tasks using adapted long-span models like Recurrent Neural Network LMs (RNNLMs). With the first pass performed using a large background n-gram LM, the adapted RNNLMs are mostly used to rescore lattices or N-best lists, a...
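As a hedged sketch of the second-pass rescoring step described above (the hypothesis format, interpolation weight, and model interfaces are assumptions made here, not the paper's implementation), N-best rescoring with an interpolated n-gram/RNNLM score might look like this:

```python
import math

def rescore_nbest(nbest, ngram_lm, rnnlm, lam=0.5, lm_scale=1.0):
    """Sketch of second-pass N-best rescoring: per-word probabilities from a
    background n-gram LM and an adapted RNNLM are linearly interpolated and
    combined with the first-pass acoustic score.

    nbest    : list of (words, acoustic_logprob) hypotheses from the first pass
    ngram_lm : callable (word, history) -> probability under the n-gram LM
    rnnlm    : callable (word, history) -> probability under the adapted RNNLM
    """
    best_words, best_score = None, float("-inf")
    for words, ac_logp in nbest:
        lm_logp = 0.0
        for i, w in enumerate(words):
            p = lam * rnnlm(w, words[:i]) + (1.0 - lam) * ngram_lm(w, words[:i])
            lm_logp += math.log(max(p, 1e-12))  # floor to avoid log(0)
        score = ac_logp + lm_scale * lm_logp
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```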
Journal:
Volume / Issue:
Pages: -
Publication date: 2000